Tape-Disk Join Strategies under Disk Contention

Authors

  • Achim Kraiss
  • Peter Muth
  • Michael Gillmann
Abstract

Large-scale data warehousing, data mining, and scientific applications require the analysis of terabytes of fact data accumulated over long periods of time. Tape libraries are suitable devices for storing such mass data. The online analytical processing (OLAP) of this data typically leads to long-running aggregation queries that join the tape-resident fact relation with disk-resident dimension relations. During the execution of the join, the disks storing the dimension relations are typically not dedicated to the join; they are subject to reads and writes invoked by concurrently running applications. In many cases, it is desirable that processing the join not degrade the performance of these concurrent applications too much. In this paper, we present an accurate model for analyzing the performance of three different tape-disk join strategies in multi-query systems such as database or OLAP servers. The major contributions of this paper are (a) a detailed cost model considering tape and disk bandwidth, tape and disk latencies, available buffer sizes, CPU costs, and the selectivity of filters on tape data, (b) the consideration of disk queueing effects due to concurrent reads and writes at the disk, and (c) the consideration of two different disk scheduling strategies. Based on the analytical model, we show the superiority of a disk scheduling strategy that gives preference to servicing the concurrent disk load. Furthermore, we present a strategy for dynamically selecting the most beneficial join algorithm and its parameters at runtime. We have implemented the tape-disk join strategies in a prototype system based on detailed simulations of secondary and tertiary storage devices. Our experimental evaluations confirm that the analytical model is indeed very accurate and a suitable basis for runtime strategy decisions.

Similar resources

Disk Scheduling for Intermediate Results of Large Join Queries in Shared-Disk Parallel Database Systems

In shared-disk database systems, disk access has to be scheduled properly to avoid unnecessary contention between processors. The first part of this report studies the allocation of intermediate results of join queries (buckets) on disk and derives heuristics to determine the number of processing nodes and disks to employ. Using an analytical model, we show that declustering should be applied e...

Disk-Tape Joins: Synchronizing Disk and Tape

Today large amounts of data are stored on tertiary storage media such as magnetic tapes and optical disks. DBMSs typically operate only on magnetic disks since they know how to maneuver disks and how to optimize accesses on them. Tertiary devices present a problem for DBMSs since these devices have dismountable media and have very different operational characteristics compared to magnetic disks...

Relational Joins for Data on Tertiary Storage

Despite the steady decrease in secondary storage prices, the data storage requirements of many organizations cannot be met economically using secondary storage alone. Tertiary storage offers a lower-cost alternative but is viewed as a second-class citizen in many systems. For instance, the typical solution in bringing tertiary-resident data under the control of a DBMS is to use operating system...

Skew-Insensitive Join Processing in Shared-Disk Database Systems

Skew effects are still a significant problem for efficient query processing in parallel database systems. Especially in shared-nothing environments, this problem is aggravated by the substantial cost of data redistribution. Shared-disk systems, on the other hand, promise much higher flexibility in the distribution of workload among processing nodes because all input data can be accessed by any ...

Virtual Tape Libraries: The Best of Tape and Disk Backup

Tape backup has traditionally been the mainstay of enterprise data protection when long-term data protection is required. As disk technologies have improved and economies of scale have driven down prices, disk adoption rates have surged and it would appear disk is poised to eclipse tape as the dominant backup platform and relegate tape to a minor archival and disaster recovery role. Numerous su...

Publication date: 1999